Ensemble modeling of black pomfret (Parastromateus niger) habitat in the Taiwan Strait based on oceanographic variables

The location, effort, catch, and time of fishing were all used in this study to assess the geographic distribution of Parastromateus niger in the Taiwan Strait. Generalized linear models (GLMs) based on six oceanographic parameters performed better than the other species distribution models tested. The favorable ranges were 26.5–29.5 °C for sea surface temperature (SST), 0.3–0.44 mg/m³ for sea surface chlorophyll (SSC), 33.4–34.4 PSU for sea surface salinity (SSS), 10–14 m for mixed layer depth (MLD), 0.57–0.77 m for sea surface height (SSH), and 0.603–… m²/s² for eddy kinetic energy (EKE). According to the statistical findings, SST has only a small effect on species distribution compared with SSS, SSC level, and EKE. An ensemble habitat model was created by combining four well-performing single-algorithm models with no obvious bias. The highest annual distributions of standardized CPUE (S.CPUE) and nominal CPUE occurred within 117°E–119°E and 22°N–24°N.


INTRODUCTION
Species distribution models (SDMs) are also referred to as habitat models, ecological niche models, bioclimatic envelopes, and resource selection functions. They are the most common method of examining species habitat patterns in relation to oceanographic variables (Zimmermann et al., 2010; Robinson et al., 2011; Beale & Lennon, 2012; Li & Wang, 2013; Tikhonov et al., 2020). Habitat models use an algorithm to build a mathematical representation of the current species distribution, and that representation is then used to predict the future distribution (Azzellino et al., 2012). Historically, arithmetic mean and geometric mean models based on the habitat suitability index (HSI) have been used (Xue et al., 2017; Li et al., 2016). To help maintain the P. niger stock within an ecologically acceptable range (SDG 14.4), we used habitat modeling to evaluate the effects of the marine environment on the P. niger habitat (Fig. 1). A thorough study of the biological preferences and habitat regions of P. niger may also help to stop its overfishing (SDG 14.6).
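The arithmetic mean model (AMM) and geometric mean model (GMM) combine per-factor suitability indices (SI) into a single HSI for each grid cell. The following is a minimal sketch that assumes equal weights and SI values already scaled to [0, 1]. The function names and the example SI values are hypothetical and are not taken from the cited studies.

```python
import math
from typing import Sequence


def hsi_amm(si_values: Sequence[float]) -> float:
    """Arithmetic mean model: HSI = (1/n) * sum(SI_i)."""
    if not si_values:
        raise ValueError("need at least one SI value")
    return sum(si_values) / len(si_values)


def hsi_gmm(si_values: Sequence[float]) -> float:
    """Geometric mean model: HSI = (prod SI_i) ** (1/n)."""
    if not si_values:
        raise ValueError("need at least one SI value")
    if any(si < 0 for si in si_values):
        raise ValueError("SI values must be non-negative")
    return math.prod(si_values) ** (1.0 / len(si_values))


# Hypothetical SI values for one grid cell (e.g. SST, SSS, SSH).
cell_si = [0.9, 0.4, 0.8]
print(round(hsi_amm(cell_si), 3))  # 0.7
print(round(hsi_gmm(cell_si), 3))
```

The geometric mean is never larger than the arithmetic mean. A single poorly suited factor therefore pulls the GMM score down more strongly than the AMM score.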

Fisheries data
We obtained P. niger fisheries data for January 2014 to December 2019 from Taiwan's Fisheries Agency. The data came from Taiwanese fishing vessels, mostly engaged in coastal fishing, with gross register tonnage below 250 tons. The monthly fisheries data covered 23°N–26°N and 118°E–120°E at a spatial resolution of 0.1°. The data provider did not specify whether the reported weights were dry or wet. Various fishing gears were used, and only data from the gears with the largest catch contributions were used in this study. The data included the year, month, latitude and longitude, catch in kilograms, effort in hours, total catch weight at each location, type of fishing gear, and vessel identification number. Data on fishing depth and gear-soaking time were unavailable.

Oceanographic data
Seven oceanographic variables (Table 1) were gathered from several sources for the current study:

- sea surface temperature (SST)
- sea surface salinity (SSS)
- mixed layer depth (MLD)
- sea surface chlorophyll (SSC) level
- sea surface height (SSH)
- zonal velocity (U)
- meridional velocity (V)

We calculated eddy kinetic energy (EKE) from U and V as follows:

$$\mathrm{EKE} = 0.5\,(U^2 + V^2)$$

SST, SSS, MLD, SSH, U, and V were obtained from the CMEMS eddy-resolving global ocean reanalysis product GLORYS12V1 (1/12° horizontal resolution with 50 vertical levels; https://resources.marine.copernicus.eu/product-detail/GLOBAL_MULTIYEAR_PHY_001_030/INFORMATION). Its processing level and coordinate reference system are L4 and WGS 84, respectively.
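The EKE formula is applied cell by cell to the gridded U and V fields. The following Python sketch assumes U and V are in m/s on the same latitude × longitude grid, held here as plain nested lists. The helper names are hypothetical.

```python
from typing import Sequence


def eddy_kinetic_energy(u: float, v: float) -> float:
    """EKE = 0.5 * (U**2 + V**2); U and V in m/s give EKE in m^2/s^2."""
    return 0.5 * (u * u + v * v)


def eke_grid(
    u_grid: Sequence[Sequence[float]], v_grid: Sequence[Sequence[float]]
) -> list:
    """Apply the EKE formula cell by cell to two equally shaped lat x lon grids."""
    if len(u_grid) != len(v_grid) or any(
        len(u_row) != len(v_row) for u_row, v_row in zip(u_grid, v_grid)
    ):
        raise ValueError("U and V grids must have the same shape")
    return [
        [eddy_kinetic_energy(u, v) for u, v in zip(u_row, v_row)]
        for u_row, v_row in zip(u_grid, v_grid)
    ]


print(round(eddy_kinetic_energy(0.3, 0.4), 3))  # 0.125
```

As written in the text, the formula uses the velocity components themselves. Studies that define EKE from velocity anomalies would first subtract a time-mean U and V.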
In addition, SSC data were obtained from the CMEMS global ocean biogeochemical hindcast product FREEGLORYS2V4 (0.25° horizontal resolution, 75 vertical levels, daily temporal resolution; https://resources.marine.copernicus.eu/product-detail/GLOBAL_MULTIYEAR_BGC_001_029/INFORMATION). Its processing level and coordinate reference system are Level 4 and ETRS89 (the EU-recommended frame of reference for geodata for Europe), respectively. The oceanographic data covered January 2014 to December 2019 and the geographic range of 116°E–123°E and 21°N–26°N. These data were interpolated in MATLAB (version 2019a) to a 0.1° spatial resolution to match the fisheries data. The daily SSC data were also converted in MATLAB to a monthly temporal resolution to match the other oceanographic and fisheries data.
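The paper does not state which interpolation method was used in MATLAB. The sketch below makes three assumptions:

- the 0.1° regridding is bilinear (MATLAB's `interp2` defaults to linear interpolation);
- the monthly SSC step is a simple average of daily values;
- points outside the source grid are clamped to the nearest edge value.

The code is pure Python, and the helper names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date
from statistics import fmean


def interp1(xs, ys, x):
    """Linear interpolation of ys(xs) at x; xs must be ascending. Values outside are clamped."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


def regrid_bilinear(lats, lons, field, new_lats, new_lons):
    """Bilinear regridding of field[lat][lon]: linear in longitude along each source row,
    then linear in latitude down each resulting column."""
    by_row = [[interp1(lons, row, x) for x in new_lons] for row in field]
    columns = list(zip(*by_row))  # one tuple of source-latitude values per new longitude
    return [[interp1(lats, col, y) for col in columns] for y in new_lats]


def monthly_means(daily):
    """Average (date, value) pairs into {(year, month): mean value}."""
    groups = defaultdict(list)
    for day, value in daily:
        groups[(day.year, day.month)].append(value)
    return {key: fmean(values) for key, values in sorted(groups.items())}


# Hypothetical daily SSC values (mg/m^3) averaged to monthly means.
ssc_monthly = monthly_means(
    [(date(2014, 1, 1), 0.3), (date(2014, 1, 2), 0.5), (date(2014, 2, 1), 0.4)]
)
print(ssc_monthly)
```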

Fisheries data standardization
The relative abundance of P. niger was assessed as the nominal catch per unit effort (N.CPUE), calculated as catch (kg) divided by fishing effort (h), from a total of 55,852 observations (Dunn et al., 2000; Lauridsen et al., 2008). We applied the widely used GLM standardization technique to obtain bias-filtered N.CPUE data (Hazin et al., 2007; Hinton & Maunder, 2004; Tian et al., 2009). This reduced the effects of three groups of factors (Mondal et al., 2021; Vayghan et al., 2020; Forrestal et al., 2019; Shono, 2004):

- spatial factors: latitude (lat) and longitude (long);
- temporal factors: year and month;
- interaction factors: year × lat, lat × long, and year × long.

The key benefits of using a GLM for standardization are that the response variable may follow any distribution from the exponential family and that categorical predictors can be used. A stepwise GLM was constructed with the stats package in R (version 3.6.0) via RStudio, using the seven aforementioned components (year, month, lat, long, year × lat, lat × long, and year × long). The Gaussian family and the glm.fit fitting method were used for GLM optimization. The GLM constructed for standardization modeled N.CPUE as a function of the main effects year, month, lat, and long and the interaction terms year × lat, lat × long, and year × long.
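With a Gaussian family and the identity link (the default link of `gaussian()` in R), the maximum-likelihood GLM fit is ordinary least squares. The standardization GLM can therefore be sketched in plain Python. The sketch makes these hypothetical coding choices:

- N.CPUE is catch (kg) divided by effort (h);
- year and month are dummy-coded;
- lat and long enter as centred numeric covariates;
- the interactions use year as an offset from the first year;
- S.CPUE is taken to be the fitted values.

The study reports none of these choices. The stepwise selection step is omitted, and all names are hypothetical.

```python
from statistics import fmean


def add_nominal_cpue(records):
    """Drop records without positive effort and attach N.CPUE = catch (kg) / effort (h)."""
    return [
        {**r, "ncpue": r["catch_kg"] / r["effort_h"]}
        for r in records
        if r["effort_h"] > 0
    ]


def design_row(r, years, months, lat0, lon0):
    """One design-matrix row: intercept, year and month dummies (first level is the
    baseline), centred lat and lon, then year x lat, lat x lon and year x lon, with
    year entering the interactions as an offset from the first year (an assumption)."""
    yr = float(r["year"] - years[0])
    lat, lon = r["lat"] - lat0, r["lon"] - lon0
    row = [1.0]
    row += [1.0 if r["year"] == y else 0.0 for y in years[1:]]
    row += [1.0 if r["month"] == m else 0.0 for m in months[1:]]
    row += [lat, lon, yr * lat, lat * lon, yr * lon]
    return row


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        for i in range(n):
            if i != col:
                f = m[i][col] / m[col][col]
                m[i] = [x - f * y for x, y in zip(m[i], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_gaussian_glm(records):
    """Gaussian GLM with identity link, fitted by least squares via the normal equations.
    Returns the coefficients and the fitted values (used here as S.CPUE, an assumption)."""
    years = sorted({r["year"] for r in records})
    months = sorted({r["month"] for r in records})
    lat0 = fmean(r["lat"] for r in records)
    lon0 = fmean(r["lon"] for r in records)
    x = [design_row(r, years, months, lat0, lon0) for r in records]
    y = [r["ncpue"] for r in records]
    p = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in x]
    return beta, fitted
```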

S.CPUE-oceanographic factor relationship
Correlations between the standardized catch per unit effort (S.CPUE) and the aforementioned oceanographic factors were established to determine the preferred range of each parameter. We created suitability index (SI) curves for each oceanographic parameter by fitting smoothing spline regressions to summed S.CPUE. The regressions used S.CPUE as the dependent variable and the selected oceanographic factors as the explanatory variables. The SI curves were then normalized as follows:

$$SI = \frac{Y - Y_{\min}}{Y_{\max} - Y_{\min}}$$

where $Y_{\max}$ and $Y_{\min}$ are, respectively, the maximum and minimum values of S.CPUE or of the oceanographic factor, and $Y$ is a simulated or predicted value between $Y_{\min}$ and $Y_{\max}$. SI therefore ranges from 0 to 1. An oceanographic factor range with a high SI value (>0.6) was considered favorable for S.CPUE (Mondal et al., 2021; Vayghan et al., 2020; Teng et al., 2021).
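The normalization and the SI > 0.6 rule can be shown with a short sketch. It assumes the smoothing-spline fit has already produced fitted summed S.CPUE values on a sorted grid of one oceanographic factor. The spline step itself is not shown, and the numbers and function names are hypothetical.

```python
def normalize_si(values):
    """Min-max scaling: SI = (Y - Ymin) / (Ymax - Ymin), so SI lies in [0, 1]."""
    y_min, y_max = min(values), max(values)
    if y_max == y_min:
        raise ValueError("a constant curve cannot be normalized")
    return [(y - y_min) / (y_max - y_min) for y in values]


def favorable_ranges(x, si, threshold=0.6):
    """Return (first, last) x values of each contiguous run where SI > threshold.
    x must be sorted in ascending order and aligned with si."""
    ranges, start, previous = [], None, None
    for xi, s in zip(x, si):
        if s > threshold and start is None:
            start = xi
        elif s <= threshold and start is not None:
            ranges.append((start, previous))
            start = None
        previous = xi
    if start is not None:
        ranges.append((start, previous))
    return ranges


# Hypothetical spline-fitted summed S.CPUE at 1 degree C SST steps.
sst = [25.0, 26.0, 27.0, 28.0, 29.0, 30.0]
fitted = [2.0, 5.0, 9.0, 10.0, 8.0, 3.0]
si = normalize_si(fitted)
print(si)                         # [0.0, 0.375, 0.875, 1.0, 0.75, 0.125]
print(favorable_ranges(sst, si))  # [(27.0, 29.0)]
```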

Single-algorithm habitat model development
The current study incorporated four single-algorithm models: a GLM, a generalized additive model (GAM), a boosted regression tree (BRT) model, and a classification and regression tree (CART) model. Each modeling technique was optimized according to established protocols. We developed one model for each technique in RStudio, with the six oceanographic factors (SST, SSS, MLD, SSH, SSC, and EKE) as predictor variables and S.CPUE as the response variable. The four models were configured as follows:

- GAM: built with the Gaussian family and generalized cross-validation in the mgcv package.
- GLM: built with the Gaussian family and the glm.fit method in the stats package.
- BRT: built with the Gini approach and the Gaussian family in the gbm package, and optimized with 100 trees, an interaction depth of seven, and a bag fraction of 0.65.
- CART: built with the rpart package using the Gaussian family and the complexity parameter (CP) method, and optimized with a CP value of 0.1 and minimum and maximum node counts of 1 and 6, respectively.
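rpart's regression trees (method "anova") grow by choosing, at each node, the split that most reduces the within-node sum of squares. The sketch below shows only that split search for a single predictor, using hypothetical SSH and S.CPUE values. It does not reproduce rpart's cost-complexity pruning (the CP value above), gbm's boosting, or any of the study's settings. The `min_leaf` argument is a hypothetical minimum node size.

```python
def sse(values):
    """Sum of squared deviations from the mean (node impurity for regression trees)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y, min_leaf=1):
    """Return (threshold, total_sse) for the single split on predictor x that minimizes
    the summed squared error of the two child nodes."""
    if min_leaf < 1:
        raise ValueError("min_leaf must be at least 1")
    pairs = sorted(zip(x, y))
    best_threshold, best_sse = None, float("inf")
    for k in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold separates tied predictor values
        left = [v for _, v in pairs[:k]]
        right = [v for _, v in pairs[k:]]
        total = sse(left) + sse(right)
        if total < best_sse:
            best_threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
            best_sse = total
    return best_threshold, best_sse


# Hypothetical SSH (m) and S.CPUE values with a jump between 0.6 and 0.7 m.
ssh = [0.5, 0.55, 0.6, 0.7, 0.75, 0.8]
cpue = [1.0, 1.2, 1.0, 4.0, 4.2, 4.0]
print(best_split(ssh, cpue))
```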

Validation of selected single-algorithm habitat models
To validate the single-algorithm models, the fisheries data set (n = 55,852) was randomly split with the caret package in RStudio into a 70% portion (n = 39,115) and a 30% portion (n = 16,737). For each single-algorithm model, three statistics were computed for both portions: the Pearson correlation coefficient (R), the root-mean-square error (RMSE), and the mean absolute error (MAE). Little variation in R, RMSE, and MAE between the two portions was taken to indicate a well-performing model with low bias.
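The split-and-compare validation can be sketched as follows. The split is a plain shuffle-and-cut, not caret's own routine, so it need not reproduce the exact portion sizes reported above. The seed and function names are hypothetical. Under this approach, small gaps returned by `metric_gaps` would indicate low bias. The study gives no numeric cutoff for "little variation".

```python
import math
import random


def split_70_30(n, seed=2021):
    """Shuffle row indices and cut them 70/30 (a plain random split)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(0.7 * n)
    return idx[:cut], idx[cut:]


def pearson_r(obs, pred):
    """Pearson correlation coefficient between observed and predicted values."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


def rmse(obs, pred):
    """Root-mean-square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def metric_gaps(obs_a, pred_a, obs_b, pred_b):
    """Absolute differences in R, RMSE and MAE between two data portions."""
    return {
        name: abs(fn(obs_a, pred_a) - fn(obs_b, pred_b))
        for name, fn in (("R", pearson_r), ("RMSE", rmse), ("MAE", mae))
    }
```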

Ensemble habitat model development
We created an ensemble habitat model with the BIOMOD2 package in RStudio (Georgian, Anderson & Rowden, 2019; Alabia et al., 2016; Reisinger et al., 2021; Tabor & Koch, 2021; Abrahms et al., 2019) to improve prediction of the P. niger habitat. After the performance of the single-algorithm models was assessed, a weighted mean ensemble model of the P. niger habitat was created. A single-algorithm model was integrated into the ensemble only if its R, RMSE, and MAE values showed no discernible bias between the two portions of the data set. Models exhibiting potential bias were excluded.
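The study does not report how the ensemble weights were derived. The sketch below assumes, hypothetically, that each retained model is weighted in proportion to a non-negative validation score, such as its R on the 30% portion. Models flagged as biased are simply left out of the input dictionary.

```python
def weighted_mean_ensemble(predictions, scores):
    """Combine per-model predictions as a score-weighted mean.

    predictions: {model_name: [prediction per grid cell]}
    scores: {model_name: non-negative skill score used as the (hypothetical) weight}
    """
    names = list(predictions)
    total = sum(scores[name] for name in names)
    if total <= 0:
        raise ValueError("at least one model needs a positive score")
    n_cells = len(predictions[names[0]])
    if any(len(predictions[name]) != n_cells for name in names):
        raise ValueError("all models must predict the same cells")
    return [
        sum(scores[name] * predictions[name][i] for name in names) / total
        for i in range(n_cells)
    ]


# Hypothetical S.CPUE predictions for three grid cells from two retained models.
preds = {"GLM": [1.0, 2.0, 3.0], "GAM": [3.0, 4.0, 5.0]}
weights = {"GLM": 1.0, "GAM": 3.0}
print(weighted_mean_ensemble(preds, weights))  # [2.5, 3.5, 4.5]
```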
After the ensemble habitat model was created, MATLAB was used to map its monthly predicted values together with the S.CPUE at each point in the study area.

Standardization of fisheries data
Over 88% of the catch was taken by otter trawl nets, gill nets, and Taiwanese seines (Fig. 2), so only data from these gears were selected for the analysis. The full GLM (with all six factors) had an explained deviance of 18.135% and an adjusted R² of 0.181. The residual distribution and quantile-quantile (QQ) plots of the full GLM (Fig. 3) showed no marked irregularities. Thus, the full GLM was used to standardize the P. niger fisheries data.
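For a Gaussian GLM with an intercept, the deviance is the residual sum of squares and the null deviance is the total sum of squares. The explained deviance is therefore the same quantity as R². The sketch below computes it and the usual adjusted R² on hypothetical numbers. The function names are illustrative only.

```python
def explained_deviance_gaussian(obs, fitted):
    """For a Gaussian GLM, deviance = residual sum of squares, so
    explained deviance = 1 - RSS / TSS (identical to R^2)."""
    mean = sum(obs) / len(obs)
    rss = sum((o - f) ** 2 for o, f in zip(obs, fitted))
    tss = sum((o - mean) ** 2 for o in obs)
    return 1.0 - rss / tss


def adjusted_r2(r2, n, p):
    """Adjusted R^2 with n observations and p predictors (intercept excluded)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


obs = [1.0, 2.0, 3.0, 4.0]
fitted = [1.1, 1.9, 3.2, 3.8]
r2 = explained_deviance_gaussian(obs, fitted)
print(round(r2, 4), round(adjusted_r2(r2, n=len(obs), p=1), 4))  # 0.98 0.97
```

With tens of thousands of observations and only a handful of terms, the adjustment is negligible. This is consistent with the nearly equal 18.135% and 0.181 reported above.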

S.CPUE-oceanographic factor relationships
The SI curves created for the S.CPUE of P. niger against the six oceanographic factors are illustrated in Fig. 4. For SI values above 0.6, the ideal SST, SSC level, SSS, MLD, SSH, and EKE ranges were 26.5–29.5 °C, 0.3–0.44 mg/m³, 33.4–34.4 PSU, 10–14 m, 0.57–0.77 m, and 0.603–… m²/s², respectively.

Contributions of single oceanographic factors
Table 2 presents the contributions of the oceanographic factors in the single-algorithm models. SSH was the most influential oceanographic factor in all four single-algorithm models. EKE was the second most important factor, ranking second in all single-algorithm models except the GLM. SSC ranked third in all single-algorithm models except the GLM. SST ranked last in all the models and was deemed the least critical parameter. SSS ranked fifth in all models except the GLM, in which it was the least influential.

Performance and validation of single-algorithm models
The validation results of the single-algorithm models are given in Table 4, and their predictions were mapped onto a 1° geographic grid.
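The text does not say how the factor rankings in Table 2 were computed. One common model-agnostic option is permutation importance, sketched below under that assumption. Each factor is scored as 1 minus the Pearson correlation between predictions on the original data and predictions after only that factor is shuffled. The toy model and data are hypothetical and do not reproduce the study's results.

```python
import math
import random


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def permutation_importance(predict, rows, factors, seed=0):
    """Score each factor as 1 - r(predictions on original rows, predictions with that
    factor's column shuffled); larger values mean the model relies on it more.
    Assumes the predictions vary across rows (otherwise r is undefined)."""
    rng = random.Random(seed)
    baseline = [predict(row) for row in rows]
    scores = {}
    for name in factors:
        column = [row[name] for row in rows]
        rng.shuffle(column)
        permuted = [predict({**row, name: value}) for row, value in zip(rows, column)]
        scores[name] = 1.0 - pearson_r(baseline, permuted)
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical data: 20 grid cells with SSH (m), EKE (m^2/s^2) and SST (degree C).
rows = [
    {"SSH": 0.5 + 0.015 * i, "EKE": 0.6 + 0.01 * ((7 * i) % 20), "SST": 26.0 + 0.2 * i}
    for i in range(20)
]


def toy_model(row):
    # Hypothetical fitted model that ignores SST entirely.
    return 3.0 * row["SSH"] + 0.5 * row["EKE"]


ranking = permutation_importance(toy_model, rows, ["SST", "SSH", "EKE"])
print(ranking)
```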

Ensemble habitat prediction
Because no discernible bias was detected in the R, RMSE, or MAE values for the 70% and 30% portions of the data, the resulting ensemble was selected for final prediction (Table 5). Figure 6 presents the predicted CPUE (P.CPUE) and S.CPUE. High annual S.CPUE values were distributed primarily within 119°E–121°E and 23°N–26°N, in the coastal waters of Taiwan. Most S.CPUE values were >4 in these locations but <1 in the rest of the study area. P.CPUE displayed a pattern indicating expansion to 26°N. Both S.CPUE and P.CPUE ranged between 0.1 and 5.

Spatial distribution
High annual S.CPUE values were observed primarily within 119°E–121°E and 23°N–26°N. The P.CPUE values displayed a comparable pattern, extending to 26°N. Several factors may explain this distribution pattern.
First, the Kuroshio Current and coastal currents have boosted species diversity and productivity in the waters near Taiwan (Naimullah et al., 2020a). The Taiwan Strait (TS), located in the tropical to subtropical western Pacific, is affected by three major currents: the Kuroshio Branch Current (KBC), the China Coastal Current (CCC), and the South China Sea (SCS) Current. These currents influence the fishing grounds and marine habitats of the East China Sea and the SCS, which border the TS to the north and south, respectively (Naimullah et al., 2022). The KBC provides a favorable environment for P. niger in the TS. The CCC carries a neritic water mass with low salinity and temperature but high nutrient content because of its connection to the rivers of the Chinese mainland (Shiah et al., 2000). By contrast, the KBC, which derives from the Kuroshio Current, has high salinity and temperature and a nutrient level comparable to that of the CCC (Chung, Jan & Liu, 2001). These traits produce a water mass whose physical characteristics differ from those of the surrounding water. Properties such as temperature and salinity affect the distribution of P. niger. Both South China Sea Water and Kuroshio Branch Water intrude northward during summer. At that time, the mean current on the eastern side of the strait can reach 90 cm/s.
In summer, the southwest monsoon wind stress is often less than 0.025 N/m², which is too weak to drive a current of 90 cm/s (Jan et al., 2002). A significant part of the circulation must therefore be driven by remote, large-scale forcing rather than by local winds. This large-scale forcing drives water from the northern SCS northward through the TS into the East China Sea. It may partly explain the higher presence of black pomfret, mainly on the southwestern coast of Taiwan, from April to August. In winter, the KBC on the eastern side is remotely driven, whereas the CCC on the western side is driven by the northeast monsoon. This may explain the higher presence of black pomfret on the northeastern coast from September to January.

Second, the KBC and CCC both contribute to upwelling. In the TS, the bottom current flows upward from the continental slope, whereas the surface current is primarily wind driven (Naimullah et al., 2020b). In addition, the eastern side of the TS receives occasional injections of water from the Kuroshio Current. Upwelling brings nutrient-rich, typically cold water to the surface. The nutrients "fertilize" the surface waters and thus support high biological production (Tang, Kawamura & Guan, 2004). Consequently, these fertilized TS zones may serve as ideal P. niger fishing locations. Beyond being a hydrological phenomenon, upwelling has a major effect on the ecosystem. On the west coast, the Taiwan Bank and Dongshan upwelling zones align well with summer fishing grounds (Tang et al., 2002). This alignment may explain the higher presence of black pomfret on the southwestern coast near Taiwan Bank in summer.

Third, the seafloor of the TS is complex.
The seabed topography and capes influence tidal currents, which form counterclockwise eddies. Topographic profiles show that the TS has a shelf-like topography from its northwestern to its southwestern part (Lin, Juang & Tsay, 2000). Tidal amplitude is higher off western Taiwan, an area that the present study also identified as a good fishing ground. These tidal currents carry water with high chlorophyll concentrations from outside the estuaries into the ocean current. This attracts secondary producers, including fish, crustaceans, and mollusks, which in turn draw P. niger into these areas, where it is harvested.

Habitat modeling approach for sustainable development
The Kuroshio Current and coastal currents near Taiwan contribute to the diversity and productivity of marine species. As a result, fleet-based fishing operations have grown substantially throughout Taiwan's waters over the past 40 years. The fishing gear used in this region includes purse seines, bottom and pelagic trawls, longlines, and gill and set nets (Fisheries Agency, Council of Agriculture, 2019). However, overfishing beginning in the 1950s caused catches to peak in 1980 and gradually decline afterward (Chen, Lin & Chuang, 2018; Liao, Huang & Lu, 2019). The problematic state of coastal and offshore fisheries in Taiwanese waters is frequently acknowledged (Liu, 2013; Chen, 2006; Shao et al., 2011), yet few fish species have been studied. Notably, P. niger stocks in the waters close to Taiwan have decreased drastically (Ju et al., 2020). The detailed information provided by habitat or spatial distribution modeling may assist in the sustainable management of P. niger. The measurement error inherent to species and habitat distribution models is pervasive and may prevent such models from contributing to spatial economic optimization for sustainable planning. However, SDMs can potentially serve as heuristic tools for addressing oceanic environmental challenges.
We emphasize the contextual application of such models.
Habitat models can make it easier to identify fishing grounds that are underutilized or only partially utilized (Rowden et al., 2017). However, the predictive accuracy of single-algorithm models can be affected by changes in the data, leading to unduly optimistic or pessimistic predictions. The current work therefore used an ensemble modeling strategy. Several single-algorithm models, often called weak learners, were trained on the same problem and then combined. Each weak learner performs poorly on its own, but together they form a strong learner, so the resulting ensemble model is more accurate. Easy identification of fishing grounds substantially increases fisheries revenue and reduces fishing effort, travel time, fuel consumption, and cost. However, such simplified identification of fishing grounds may also lead to overfishing, which highlights the relevance of the SDGs (Mugagga & Nabaasa, 2016). The adoption of SDG 14 has sparked discussion about ocean health and its importance to the future of the planet (Ntona & Morgera, 2018). Here, the most important aspect is conservation (Virto, 2018). SDMs can first be used to identify the distribution zone of a particular species, and conservation measures can then be taken in overexploited areas. Stock assessment can determine whether these high- or low-catch zones are overexploited or underexploited (Kenny et al., 2018). The SDG targets are intended to address the major problems threatening ocean resources, such as overfishing and climate change (Cormier & Elliott, 2017; Griggs et al., 2017). Doing so requires emphasis on the socioeconomic dimensions of ocean politics and on the distinct positions of the least developed countries and small island states. The SDGs have gained institutional acceptance since their adoption (Friess et al., 2019; Sturesson, Weitz & Persson, 2018).

Understanding the habitat of P. niger in the TS may facilitate the sustainable management of the species. The primary aim of SDG 14.4 is biologically sustainable fish stock levels. Habitat modeling can play a crucial role in achieving this goal by identifying the P. niger habitat in the TS. SDG 14.5 focuses on conservation in coastal and marine areas: highly exploited areas can be declared protected areas through temporary fishing prohibitions to promote stock sustainability. SDG 14.6 calls for an end to subsidies that contribute to overfishing, so subsidies for fishing vessels traveling to less-exploited areas should be discontinued. The sustainability of the oceans and their resources can also be promoted by enhancing scientific understanding and research and by transferring marine technology. Related policies should consider the Criteria and Guidelines of the Intergovernmental Oceanographic Commission (SDG 14.a), support small-scale fisheries (SDG 14.b), and implement and uphold international maritime law (SDG 14.c). The modeling of species distributions or habitats may constitute the initial stage of sustainability research (Fig. 7).

According to Ju et al. (2020), the black pomfret stock in the TS has collapsed, a finding supported by the Taiwan Fisheries Agency's yearbook. Black pomfret fisheries production and value both declined from 2012 to 2021, with values of 0.2 million tons and 20 billion NTD, respectively. These declines underscore the importance of the present study, in which we took habitat modeling as the first step toward sustainable management of the black pomfret fishery in the TS. Fisheries management organizations have developed and embraced ecosystem-based management techniques. The ability of the oceans to meet the needs of their species is threatened (Neumann, Ott & Kenchington, 2017). As a result, many people may be forced to drastically reduce their demands on ocean ecosystems.
The current study identified the detailed habitat preferences and zones of P. niger to help maintain the stock at ecologically acceptable levels (SDG 14.4). A proper understanding of habitat preferences and zones can help to prevent the overfishing of P. niger (SDG 14.6). In future research, we plan to use habitat-based modeling to predict the effects of climate change on P. niger and to offer recommendations for sustainability.

CONCLUSION
This study used a variety of oceanographic characteristics to pinpoint the geographic range of P. niger in the TS. Because the GLM approach outperformed the other models, we chose it for standardization. The S.CPUE of P. niger peaked at SST, SSC level, SSS, MLD, SSH, and EKE values of approximately 29.5 °C, 0.36 mg/m³, 34.2 PSU, 12 m, 0.67 m, and 0.661–0.724 m²/s², respectively. According to the statistical analysis of our ensemble model, SST is the least important factor, whereas SSH and EKE are the key factors affecting the P. niger distribution. The highest annual P.CPUE and S.CPUE values were found in the region of 21°N–26°N and 119°E–121°E.